Profound data confusion is reported for the paper by Smykal et al. (Genomic diversity and macroecology of the crop wild
relatives of domesticated pea. Scientific Reports 7, 17384; 2017), which challenges the validity of its scientific conclusions
and of the data reported.
Crop wild relatives can be an important source of genes useful for breeding, so their genetic diversity attracts
growing attention, especially in view of the deterioration of their natural habitats caused by human economic activity and
climate change. Wild peas (Pisum L.) are no exception in this respect: their genetic diversity, phylogeny and
phylogeography are currently being studied worldwide. One of the most recent papers on wild pea
phylogeography was published in ‘Scientific Reports’ by Smykal et al. (2017). It is based on rich material and uses
advanced molecular and bioinformatic methods. Unfortunately, these efforts appear to have been in vain, since the
paper shows signs of profound data confusion, so that all of its conclusions should be treated with caution. This
situation arose because the confusion could have been detected only by reviewers who had worked with the same
accessions, although the considerable sloppiness in the arrangement of the supplementary data might also have alerted a
reviewer. We cannot judge at what level(s) the confusion took place: seeds in the germplasm collection, tubes in the
laboratory, sequences in the database, etc. Below, however, we present evidence that the confusion did take place. This
evidence comes from the plastid genomes of several pea accessions that we have sequenced for our forthcoming
phylogenetic analysis and submitted to GenBank / European Nucleotide Archive. Eleven accessions were included in both
our study and that of Smykal et al. (2017). Among them, the sequence of the trnS-trnG plastid spacer in the genomes
sequenced by us coincided with that reported by Smykal et al. (2017) in only three cases.

An important claim by Smykal et al. (2017) was the detection of introgression of plastid DNA from Pisum sativum L. (a
diverse species including wild and cultivated forms) into P. fulvum Sibth. et Smith, a distinct wild species confined to the
Near East. This finding concerned the intergenic spacer between trnS and trnG. Haplotypes characteristic of P. sativum
subsp. elatius (Bieb.) Aschers. et Graebn. were reported in six of 149 accessions of P. fulvum, in particular in
the accession WL2140 originating from the Valley of the Cross in Jerusalem. We obtained this accession in the 1990s from
Norman Weeden at Cornell University and have recently sequenced its plastid genome (MG458702.1). However, our data did
not reveal any trace of that introgression: the trnS-trnG spacer of WL2140 was, as expected, identical to that of accession
706 (also P. fulvum), as sequenced both by us (MG458703.1) and by Smykal et al. (2017) (KU678952). The trnS-trnG spacer
haplotype reported for WL2140 by Smykal et al. (2017) (KU679224) was identical to that found in the plastid genome
sequences NC_014057 and KJ806203 of the cultivated pea (P. sativum subsp. sativum), as well as to that obtained by us for
JI1794 (HG966675), P. sativum subsp. elatius s. l. (see below). These two haplotypes differ by five nucleotide substitutions. Thus,
our conclusion is that at least one case of introgression of P. sativum plastid DNA into P. fulvum claimed by
Smykal et al. (2017) is erroneous, and the other cases require further verification.

It is possible that Smykal et al. (2017) confused accessions WL2140 (P. fulvum) and JI2140 (P. sativum; a traditional
cultivar/landrace according to the John Innes Centre listing): the latter is listed in the ‘DARTseq-samples’ worksheet of
their Supplementary Table but, surprisingly, is not mentioned on any other worksheet of their supplementary material
(41598_2017_17623_MOESM2_ESM.xlsx).

Other inconsistencies in the assignment of plant material can also be found. On the ‘P_sativum-elatius’ worksheet of the same
supplementary table, Smykal et al. (2017) correctly stated that accession 712 (designation by Ben-Ze’ev & Zohary, 1973) is
the same as L100 (designation by Herbert Lamprecht), as follows: “712 (=JI3273)=L100”. However, on the ‘ITS
ribotypes’ and ‘trnSG haplotypes’ worksheets they appear as separate entries, each with its own NCBI accession number. Moreover, the corresponding
trnS-trnG spacer sequences represent different haplotypes. This is an obvious error. We (Bogdanova et al., 2015)
have sequenced the plastid genome of L100 (HG966676.1), and its trnS-trnG haplotype proved identical to that of 712
(KU679018) but not to that of “L100” (KU679059) reported by Smykal et al. (2017).

There are further discrepancies between the trnS-trnG spacer haplotypes reported by Smykal et al. (2017) and the
plastid genome sequences we obtained for the same accessions. For instance, we (Bogdanova et al., 2015)
sequenced (HG966675.1) an important accession, JI1794 (= 714), used in mapping experiments (Timmerman-Vaughan et
al., 1996; Weeden et al., 1998). We are confident that no confusion occurred in our study, since we detected the inversion
with breakpoints in the spacers psaI-accD and psbI-trnS predicted by Palmer et al. (1985) for this pea accession. The
trnS-trnG haplotype reported for this accession by Smykal et al. (2017) (KU679121) differed from the one we found
(HG966675) by one nucleotide substitution, but was identical to the haplotypes reported by the same authors for accessions
WL2140 (KU679224) (see above) and PI344537 (KU679200).

In addition, the trnS-trnG spacer haplotypes in the plastid genomes sequenced by us differed from those reported by Smykal et al.
(2017) for accessions Pis2853, Pis2845, W6 26109 (provided by Petr Smykal), JI3557 (provided by us to him) and
PI344537, but coincided for accession IG64350 (provided by him to us).

The arrangement of the supplementary materials (41598_2017_17623_MOESM2_ESM.xlsx) of the cited paper suggests that
further confusion occurred. The unexpected appearance of a cultivated pea accession, JI2140, on one of its worksheets (in a paper
devoted to wild peas) is mentioned above. All nine accessions for which “VIR, Russia” (i.e. the N.I. Vavilov All-Russian
Institute of Plant Genetic Resources, Saint Petersburg, Russia) is indicated as the source on the ‘DARTseq-samples’ worksheet
have designations never used at VIR: 711, 713, 714, 721, 722, PO14, PO16, PO17 and L100. At the same time, on the ‘ITS
ribotypes’ worksheet, L100 is marked as obtained from JIC, while the others, together with several more accessions including 712, are
marked as obtained from ‘Novosibirsk, Russia’, which undoubtedly means our laboratory. The latter attribution must be correct, as we did
exchange wild pea germplasm. This circumstance is important: it rules out the possibility that the confusions described above stem from
the long and obscure history of accessions in germplasm collections, since they concern germplasm that was recently and directly
exchanged, and at least some of the germplasm analysed by Smykal et al. (2017) must have been derived from our stocks. Moreover, VIR does not
hold pea accessions VIR10-VIR30, which are listed in the supplementary materials as P. elatius; most likely, the authors used the
ordinal numbers from the list of accessions sent to them (M.A. Vishnyakova, pers. comm.).

To avoid unfortunate cases such as the paper by Smykal et al. (2017), two measures seem desirable: (i) to select reviewers
familiar not only with the methods but also with the biological material involved, and (ii) to review thoroughly not only
the paper itself but also the arrangement of the supplementary materials, which are indispensable and may be the most
important part of a publication, as they contain the actual data.

This refutation was submitted to ‘Scientific Reports’ on 2.03.2017 and rejected on 12.07.2018. In his reply, Dr. P.
Smykal noted in particular that the above discrepancies could partly result from heterogeneity of the accessions (not
discussed at all in their paper) and that the VIR samples were taken from the VIR Herbarium (not indicated in their paper); he did not
exclude a confusion of WL2140 and JI2140.

The datasets analysed during the current study are available from the GenBank / ENA repository 
https://www.ncbi.nlm.nih.gov/; the accession numbers are mentioned in the text. 